@nicu-da (Contributor) commented on Jun 6, 2025:

Pull Request Checklist

Cluster Testing

  • If a cluster test is required, comment /cluster_test on this PR to request it, and ping someone with access to the DA-internal system to approve it.
  • If a hard-migration test is required (from the latest release), comment /hdm_test on this PR to request it, and ping someone with access to the DA-internal system to approve it.

PR Guidelines

  • Include in the release notes any change that might be observable by our partners or affect their deployment.
  • Specify fixed issues with "Fixes #n", and mention issues worked on using "#n".
  • Include a screenshot for frontend-related PRs; see the README or use your favorite screenshot tool.

Merge Guidelines

  • Make the git commit message look sensible when squash-merging on GitHub (most likely: just copy your PR description).

@nicu-da (Contributor, Author) commented on Jun 6, 2025:

/hdm_test

2 similar comments: @nicu-da posted /hdm_test twice more on Jun 6, 2025.

nicu-da added 14 commits on June 6, 2025 at 12:48, each signed off by Nicu Reut <[email protected]>.
@nicu-da force-pushed the nicuda/pulumi/operator_v2_upgrade branch from 62bc31b to b8c57bf on June 6, 2025 at 12:57.
@nicu-da marked this pull request as ready for review on June 10, 2025 at 07:36.
@moritzkiefer-da (Contributor) left a comment:

Thanks! Do you have a link to a successful HDM test?

I am slightly nervous about just replacing this. Ideally, we would get a few weeks of experience running this on internal clusters instead of forcibly rolling it out to dev/test/mainnet with the next upgrade.

How hard would it be to keep this as an option that we only enable on internal clusters for now?


# Upgrade workarounds, include a GH issue to remove them once the base version changes

# TODO(#14679): Remove
Contributor:
Why do we have this stuff at all?

Author (@nicu-da):
Indeed, it doesn't make sense; let me clean it up.

namespace: namespaceName,
},
spec: {
image: `pulumi/pulumi:${semver.gt(pulumiVersion, minimumPulumiVersionRequired) ? pulumiVersion : minimumPulumiVersionRequired}-nonroot`,
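For context, the image-tag selection above reduces to the following sketch; the semver.gt floor and the -nonroot suffix come from the diff, while the concrete version values and variable bindings are hypothetical:

```typescript
import * as semver from 'semver';

// Hypothetical values for illustration only.
const pulumiVersion = '3.120.0';
const minimumPulumiVersionRequired = '3.116.0';

// Use the configured Pulumi version, but never drop below the floor
// required by the v2 operator image.
const effectiveVersion = semver.gt(pulumiVersion, minimumPulumiVersionRequired)
  ? pulumiVersion
  : minimumPulumiVersionRequired;

const image = `pulumi/pulumi:${effectiveVersion}-nonroot`;
// => 'pulumi/pulumi:3.120.0-nonroot'
```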
Contributor:
Does this still go through our cache?

Author (@nicu-da):
My understanding is yes, since basically everything goes through that proxy and gets cached. @isegall-da, is that correct?

Contributor:
I don't think so; I believe only the DinD is configured to use the cache, so that images we pull via "docker pull" in tests come from there.
But I'm not sure we care. How often do we expect this to be re-pulled per cluster? (The rate limit is per IP, AFAIK.)

Contributor:
We needed the cache for CI jobs that pull multiple images per run and were therefore hitting rate limits on docker.io.

Contributor:

> But I'm not sure we care. How often do we expect this to be re-pulled per cluster? (The rate limit is per IP, AFAIK.)

I guess ciperiodic might pull the most, with one pod per stack? Hopefully that's still infrequent enough.

Contributor:
The rate limit seems to be 100 pulls per IP address per 6 hours. I think we should be OK. Famous last words...

Contributor:
And I think k8s has some internal cache too, doesn't it? (It didn't help with "docker pull" in DinD, but it will for a pod's image.)
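That matches how kubelet behaves for pinned tags. As a hedged illustration (not this PR's actual code), a workspace pod defined via @pulumi/kubernetes would only hit docker.io once per node, since a versioned tag defaults imagePullPolicy to IfNotPresent:

```typescript
import * as k8s from '@pulumi/kubernetes';

// Names, namespace, and version below are illustrative, not from this PR.
const workspacePod = new k8s.core.v1.Pod('pulumi-workspace', {
  metadata: { namespace: 'operator' },
  spec: {
    containers: [
      {
        name: 'pulumi',
        image: 'pulumi/pulumi:3.120.0-nonroot', // pinned, versioned tag
        // Explicit here for clarity; this is already the default for any
        // non-:latest tag, so each node pulls the image at most once and
        // later pods reuse kubelet's local copy.
        imagePullPolicy: 'IfNotPresent',
      },
    ],
  },
});
```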

@nicu-da (Contributor, Author) commented on Jun 10, 2025:

> Thanks! Do you have a link to a successful HDM test?

I have one where the steps touched by this change succeeded, but the full workflow failed because it ran during the switch to splice. I will trigger a new one, though, to test it out with the new repo.

> I am slightly nervous about just replacing this. Ideally, we would get a few weeks of experience running this on internal clusters instead of forcibly rolling it out to dev/test/mainnet with the next upgrade.
>
> How hard would it be to keep this as an option that we only enable on internal clusters for now?

It's actually fairly simple, as we can keep the operator deployment on an old reference if needed. That said, I wouldn't postpone it longer than necessary. Once it's merged to main, updating CILR should provide the feedback we need within a few days of testing.

@moritzkiefer-da (Contributor) commented:

> It's actually fairly simple, as we can keep the operator deployment on an old reference if needed. That said, I wouldn't postpone it longer than necessary. Once it's merged to main, updating CILR should provide the feedback we need within a few days of testing.

Fair; I guess we could also revert reasonably easily if needed.

@moritzkiefer-da (Contributor) left a comment:

Thanks!

nicu-da added a commit signed off by Nicu Reut <[email protected]>.
@nicu-da (Contributor, Author) commented on Jun 10, 2025:

/hdm_test

1 similar comment: @nicu-da posted /hdm_test again on Jun 10, 2025.

nicu-da added a commit signed off by Nicu Reut <[email protected]>.
- cpu: 5,
- memory: config.optionalEnv('OPERATOR_MEMORY_LIMIT') || '20G',
+ cpu: 2,
+ memory: config.optionalEnv('OPERATOR_MEMORY_LIMIT') || '4G',
Contributor:
This is now per stack, which is why we need 80% less, right?

Author (@nicu-da):
Yes. We'll update it again after we deploy to prod and see actual usage, but the main container should be fairly light.
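For illustration, the new per-stack limits with the environment override might be wired roughly as below; process.env stands in for the repo's config.optionalEnv helper, whose exact signature isn't shown here:

```typescript
// A minimal sketch, assuming plain Node environment access; the real code
// uses a config helper (config.optionalEnv) from this repository.
const operatorResources = {
  limits: {
    cpu: 2,
    // Overridable per cluster; the default dropped from 20G to 4G because
    // each workspace pod now serves a single stack.
    memory: process.env.OPERATOR_MEMORY_LIMIT || '4G',
  },
};
```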

@isegall-da (Contributor) left a comment:
LGTM, thank you!

@nicu-da merged commit 3fce9d9 into main on Jun 10, 2025. 41 checks passed.
@nicu-da deleted the nicuda/pulumi/operator_v2_upgrade branch on June 10, 2025 at 13:14.